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Transposable elements (TEs) are known to Influence the regulation of neighboring genes through a variety of mechanisms. 
Additionally, it was recently discovered that TEs can regulate non-neighboring genes through the trans-acting nature 
of small interfering RNAs (siRNAs). When the epigenetic repression of TEs is lost, TEs become transcriptionally active, 
and the host cell acts to repress mutagenic transposition by degrading TE mRNAs into siRNAs. In this study, we have 
performed a genome-wide analysis in the model plant Arabidopsis tlialiana and found that TE siRNA-based regulation 
of genie mRNAs is more pervasive than the two formerly characterized proof-of-principle examples. We identified 27 
candidate genie mRNAs that do not contain a TE fragment but are regulated through partial complementarity by the 
accumulation of TE siRNAs and are therefore influenced by TE epigenetic activation. We have experimentally confirmed 
several gene targets and demonstrated that they respond to the accumulation of specific 21 nucleotide TE siRNAs that 
are incorporated into the Arabidopsis Argonautel protein. Additionally, we found that one TE siRNA specifically targets 
and inhibits the formation of a host protein that acts to repress TE activity, suggesting that TEs harbor and potentially 
evolutionarily select short sequences to act as suppressors of host TE repression. 



Introduction 

Gene regulation occurs in plants and animals through the 
microRNA pathway, which is dependent upon the ability of 
an Argonaute family protein to bind small RNAs and guide 
the regulation of target mRNA transcripts (reviewed in ref 1). 
In the model plant Arabidopsis thaliana, the protein Dicer- 
likel (DCLl) processes stem-loop microRNA primary tran- 
scripts, and the resulting 21-22 nucleotide (nt) microRNAs are 
incorporated into the Argonautel (AGOl) protein. When a 
microRNA guides AGOl to a fully or partially complementary 
target mRNA transcript, the result is either mRNA transcript 
cleavage or translational inhibition, both resulting in reduced 
protein production.'' ' 

Arabidopsis AGOl is also involved in the regulation of gene 
expression through the plant-specific /Ttfw-acting small interfer- 
ing RNA (tasiRNA) pathway.*" In this pathway, a microRNA- 
loaded AGOl (or AG07) protein cleaves a non-protein coding 
transcript, which is then targeted for double-stranded RNA 
(dsRNA) production by the RNA-dependent RNA Polymerase6 
(RDR6) protein. This dsRNA is cleaved by DCL2 and DCL4 to 
produce multiple 21 nt tasiRNAs that are each loaded into AGOl 
and can target complementary or partially complementary pro- 
tein-coding mRNAs (reviewed in ref. 7). Gene regulation by the 
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tasiRNA pathway is dependent on AGOl activity and is neces- 
sary for proper cell type specification and development.'* '" 

In contrast to the roles of AGOl in gene regulation, the 
other Argonaute proteins (Arabidopsis has ten Argonaute family 
proteins) AG04, AG06, and possibly AG09 (but not AGOl), 
function to keep repetitive regions of the Arabidopsis genome 
in a transcriptionally silenced state. These other AGO pro- 
teins function in RNA-directed DNA methylation (RdDM) to 
establish and maintain transposable element (TE) silencing."'" 
Virtually all TEs in the plant body of Arabidopsis are epigeneti- 
cally silenced, and the small RNAs produced from TEs by the 
RdDM pathway are 24 nt in length, which is larger than the 
21—22 nt size of microRNAs and tasiRNAs that are incorporated 
into AGOl and function to regulate genes''' (reviewed in ref 15). 
Therefore, it has long been assumed that AGOl does not func- 
tion at TE loci or incorporate TE-derived small RNAs. However, 
it was recently discovered that upon TE epigenetic activation and 
RNA Polymerase II (Pol II) transcription, an endogenous RNAi 
pathway degrades TE mRNAs in an AGOl-dependent fashion."' 
In this endogenous RNAi pathway, RDR6, DCL2, and DCL4 
act to degrade Pol ITderived TE transcripts into 21—22 nt small 
interfering RNAs (siRNAs) that are incorporated into AGOl, 
similar to the tasiRNA gene-regulatory pathway mechanism. 
Thus, AGOl does incorporate TE siRNAs; however, this occurs 
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only when TEs are transcriptionally active and produce 21-22 nt 
siRNAs. 

Since AGOl function is associated with gene regulation, we 
wondered if AGOl could differentiate between an incorporated 
gene-regulating microRNA/tasiRNA and a TE-derived siRNA 
of the same size produced from a TE mRNA. Because TE 21-22 
nt siRNA production is amplified by the activity of RDR6,'*''^ 
the potential exists for high levels of TE 21-22 nt siRNAs to 
be incorporated into AGOl. AGOl could utilize TE-derived 
21-22 nt siRNAs in a manner similar to a tasiRNA for the regu- 
lation of genie mRNAs that have full or partial complementar- 
ity. However, this regulation of genes by TE siRNAs would only 
occur when TEs are epigenetically active, providing an additional 
layer of influence of TE epigenetic regulation on genes. TEs are 
known to have regulatory effects on genes; however, these effects 
are only understood for the genes that flank TEs (by read-through 
transcription, for example) or are at least linked to the gene and 
represent a cis effect of TE regulation on gene expression. Some 
of these TE-induced cis effects can occur at large distances, par- 
ticularly in mammalian genomes, where TEs have been shown to 
influence promoter and enhancer regions over 100 kb away from 
a gene's protein coding region."*'''' Alternatively, through our 
proposed tasiRNA-like mechanism, non-TE neighboring genes 
could be indirectly regulated in trans by an unlinked TE dur- 
ing conditions of stress or other diverse environmental influences 
that are known to affect the epigenetic regulation of TEs.^"'^' 

We recently demonstrated that one particular TE siRNA, 
termed siRNA854, is derived in its 21-22 nt form from the 
Arabidopsis AthilaGA family of centromeric gypsy LTR retrotrans- 
posons only when TEs are globally epigenetically activated and 
transcribed by Pol II (in particular Arabidopsis mutants, stressed 
long-term cell cultures, or in the chromatin-decondensed pollen 
vegetative nucleus)."' In contrast to the 24 nt siRNAs TEs pro- 
duce when silenced, the 21-22 nt versions of siRNA854 are pro- 
duced upon transcriptional activation and are incorporated into 
AGOl. We have shown that the TE-derived siRNA854 inhibits 
translation of the UBPlb genie mRNA, which contains partial 
complementarity to siRNA854 in its 3' UTR. Thus UBPlb pro- 
tein accumulation is regulated by the epigenetic control of a TE, 
and the T'EAthila6A can regulate a gene in trans through the pro- 
duction of a gene-regulating siRNA. In addition to Arabidopsis, 
a Drosophila TE piwi-interacting RNA (piRNA) has also been 
recently shown to regulate a genie mRNA."^ Although these stud- 
ies have demonstrated the proof of principle that TE small RNAs 
can regulate genes in trans, a genome-wide analysis of this regu- 
lation has not been performed, and many critical unanswered 
questions about this type of gene regulation remain.^' 

We performed a genome-wide small RNA and gene expres- 
sion analysis in a mutant background with the maximum pos- 
sible TE transcriptional activation in order to best observe the 
effect of active TEs on gene regulation. We aimed to answer three 
critical questions. First, which genes are regulated by TE siR- 
NAs? Second, how many genes are regulated by this mechanism? 
Third, is this regulation evolutionarily random/stochastic, or do 
TEs carry sequences to specifically target particular mRNAs? 
We have informatically identified and experimentally confirmed 



genie targets of TE siRNA regulation, and are now able to esti- 
mate how many genes are affected by this mechanism. We have 
also concluded that for at least one TE, the siRNA it produces 
and gene it regulates provides a clear benefit to the TE, provid- 
ing direct evidence that TEs specifically encode siRNAs to target 
particular mRNAs and act as suppressors of host repression. 

Results 

When TEs are transcriptionally activated, AGOl incorporates 
high levels of Athila TE siRNAs. To determine which genie 
mRNAs are regulated by TE siRNAs on a genome-wide level, 
we began by immunoprecipitating the AGOl protein (AGOl-IP) 
from wild-type Columbia (wt Col) and ddml mutant inflores- 
cences. We used an AGOl-specific antibody that has previously 
been shown to interact with AGOl and no other Argonaute 
family proteins.'*'^'' The DDMl gene encodes a swi2/snf2 fam- 
ily chromatin remodeling protein that uses ATP hydrolysis to 
compact histones and is responsible for the formation of hetero- 
chromatin.^^'^'' In homozygous recessive ddml mutant plants, het- 
erochromatic marks such as CG methylation and histone 3 lysine 
9 di-methylation are lost from the TE portion of the genome, 
resulting in global TE transcriptional activation. ^'''^^ As the reces- 
sive ddml mutation is homozygous for increasing generations, TE 
mRNA expression and siRNA levels increase,"" while these plants 
accumulate developmental phenotypes and become less fertile.^' 
We used ddml plants that were homozygous for six generations 
{ddml F6), which accumulate high levels of TE transcripts and 
siRNAs""'^" in order to best observe any TE siRNA regulation of 
genie mRNAs. We performed three biological replicates (bioreps) 
of the AGOl-IP from each genotype (wt Col and ddml F6), and 
isolated and deep sequenced the AGOl-bound small RNAs. We 
obtained a minimum of 1,795,708 genome-matched and filtered 
reads for each biorep (see Materials and Methods). To ensure that 
the AGOl protein is not incorporating small RNAs after the cell 
is ruptured, we spiked lysed inflorescence cell extracts with puri- 
fied small RNAs from seedlings and subsequently performed an 
AGOl-IP and northern blot (Fig. SI). This experiment demon- 
strated that incorporation of a seedling-specific microRNA into 
AGOl does not occur in vitro and, therefore, our AGOl-IP deep 
sequencing data represents AGOl binding in vivo and not after 
cell lysis. 

We first pooled the AGOl-IP bioreps for each genotype and 
compared each small RNA library to a total small RNA (non-IP) 
library made from the same genotypes and tissue." We deter- 
mined that, compared to the total small RNA libraries, 21 nt 
small RNAs are 2.44-fold enriched in the AGOl-IPs (Fig. lA), 
confirming previous findings that AGOl binds primarily 21 nt 
small RNAs.^^^ In the total small RNA libraries, ddml has an 
increase in the number of 21 nt small RNAs compared to wt 
Col (Fig. lA). The increase in 21 nt small RNAs is derived from 
degraded TE mRNAs (Fig. IB), produced from the combined 
activity of DCL2, DCL4, and RDR6."^ In the ddml F6 AGOl-IP 
small RNA dataset, an increase in the 21 nt size class is not appar- 
ent (Fig. lA). However, when we characterized the AGOl-IP 
small RNA library composition, we identified a 4.5-fold increase 
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[557779 reads per million (RPM)] in the amount of TE siRNAs 
present in the ddml F6 AGOl-IP compared to wt Col AGOl-IP 
(Fig. IB). Therefore, we conclude that TE siRNAs are specifi- 
cally enriched in AGOl protein complexes when TEs are tran- 
scriptionally activated and processed into 21-22 nt siRNAs. 

To perform an analysis of the total diversity of small RNA 
sequences within AGOl protein complexes, we individually 
analyzed each biological replicate. First, using the microRNA 
distribution, we found that the similarity and reproducibility 
between individual bioreps and IPs within each genotype was 
very high (Fig. IC). We performed pairwise biorep compari- 
sons of the microRNA accumulation within each genotype and 
found that there was more variability within TE-activated ddml 
F6 mutants, where the lowest R^ value between two bioreps is 
0.98936, while the lowest R^ value between two wt Col bioreps 
was 0.99757. In contrast, the similarity across genotypes was less 
(R^ = 0.9675-0.98261). We used the individual bioreps to deter- 
mine that the increase in TE siRNAs in AGOl protein complexes 
results in a statistically significant {P < 0.0001) 3-fold increase 
in the overall diversity of sequences within AGOl protein com- 
plexes (Fig. ID). Therefore, there is the potential to target more 
mRNAs for regulation in ddml F6 mutants due to the increased 
small RNA sequence diversity within AGOl protein complexes. 

We previously determined that the TE siRNAs that increase 
in abundance in ddml F6 mutant plants are not from all TEs, but 
rather from specific TE families. The three major contributor 
TE families to this increase are the Athila LTR retrotransposons, 
the AtGPl LTR retrotransposons, and the AtENSPM5 family of 
DNA transposons. In addition, we characterized a control TE 
family, AtReplOD, a small non-autonomous Helitron element that 
produces large quantities of 24 nt siRNAs when silenced, but 
does not produce 21 nt siRNAs in ddml mutants (Fig. lE).^' We 
annotated the TE siRNA portion of both the total and AGOl-IP 
small RNA libraries and characterized how many siRNAs these 
TE families contributed. We first annotated the distribution of 
these four TE families using 24 nt siRNAs, which are produced 
when TEs are associated with DNA methylation and are tar- 
gets of RdDM. We find that of the 24 nt TE siRNAs present 
in either wt Col or ddml F6 total small RNA libraries, few are 
incorporated into AGOl (Fig. IE). The incorporation of a small 
amount of TE 24 nt siRNAs into AGOl has been previously 
demonstrated in wt Col.^^'^' We next focused on only the 21 nt 
siRNAs, as these are the size class that are specifically enriched 
in our AGOl-IP (Fig. lA) and are known to regulate genes.''' 
We found that there are very few TE 21 nt siRNAs in wt Col 
total small RNA libraries or in wt Col AGOl-IPs. In ddml F6 
mutants, the amount of 21 nt siRNAs associated with AGOl 
increases 3-fold iox AtGPl, 29-fold ior AtENSPM5, and 133-fold 
for Athila compared to wt Col (Fig. IE). We have previously 
shown that even though nearly all TE families are transcription- 
ally activated in ddml F6 mutants, only 15 TE families generate 
RDR6-dependent 21-22 nt siRNAs when activated,^' and why 
only these TE families generate 21-22 nt siRNAs remains enig- 
matic. Although siRNAs of all three of these TE families increase 
in AGOl complexes when TEs are transcriptionally active, the 
279,476 RPM increase in Athila 21 nt siRNAs (Fig. IE) is the 



major contributor responsible for the increase in total TE siRNAs 
in AGOl-IPs from ddml F6 mutants (Fig. IB). In contrast to the 
24 nt siRNAs, most TE 21 nt siRNAs present in either wt Col 
or ddml F6 seem to be incorporated into AGOl, as Athila 21 nt 
siRNAs are enriched 1.8-fold in ddml F6 AGOl-IPs compared to 
the ddml F6 total small RNA library (Fig. IE). 

We next aimed to determine if AGOl protein and/or tran- 
script levels increased to accommodate the additional 434,171 
RPM TE siRNAs incorporated into AGOl in ddml F6 mutants 
compared to the wt Col AGOl-IP, or if particular small RNAs 
were flooded out of AGOl by these TE siRNAs. We found 
that the levels of the AGOl transcript, the levels of the AGOl- 
targeting microRNAl68, the slicing of the AGOl mRNA by 
microRNAl68, and the AGOl protein levels, all do not increase 
in ddml F6 mutant individuals (Fig. S2). Therefore, we conclude 
that the level of AGOl does not change to accommodate the 
large volume of TE siRNAs incorporated in ddml F6 mutants. 
Instead, we found that TE siRNAs accumulate in AGOl pro- 
tein complexes at the expense of the exclusion of microRNAs 
(Fig. IB). The levels of cDNA- and intron-matching siRNAs, 
as well as non-TE intergenic siRNAs, also decrease in ddml F6 
(Fig. IB); however, these are minor changes relative to the sta- 
tistically significant {P < 0.0001) decrease of microRNA accu- 
mulation in the ddml AGOl-IPs (Fig. IF). To determine which 
microRNAs are decreased in ddml AGOl-IPs, we calculated the 
accumulation for each microRNA in each biorep and performed 
individual statistical tests to determine which microRNAs are 
significantly {P < 0.05) decreased. Figure IF lists the microR- 
NAs significantly decreased in ddml AGOl-IPs. These microR- 
NAs represent many of the most abundant microRNAs in wt 
Col AGO-IPs, and just these 11 microRNAs account for a 2.3- 
fold decrease (305,900 RPM) in ddml AGOl-IPs. We verified 
the increase in Athila TE siRNAs and decrease in microRNAs 
in AGOl-IPs using small RNA northern blots from total and 
AGOl-IP small RNAs (Fig. IG). In Figure IG, the decrease in 
microRNA166 (miRl66) is apparent in both the AGOl-IP and 
total small RNA fractions in ddml F6 mutants. To determine 
if the reduction in microRNAs has an effect on the regulation 
of the microRNA-target mRNAs, we performed qRT-PCR and 
5' cleavage RACE RT-PCR to measure the abundance of un- 
cleaved and cleaved target mRNA. Using microRNAs with low, 
medium, and high levels of abundance in wt Col AGOl-IPs, we 
were unable to detect any change in the un-cleaved steady-state 
transcript levels or microRNA-directed cleavage patterns of the 
microRNA-target mRNAs (Fig. S3). Therefore, we conclude 
that there is still enough microRNA present in AGOl or in other 
AGO complexes in ddml F6 mutants such that the decrease in 
microRNAs in AGOl protein complexes does not disrupt the 
regulation of these target genes. 

Identification of TE siRNAs incorporated into AGOl and 
their predicted genie mRNA targets. To identify which TE siR- 
NAs potentially regulate genes, we filtered the AGOl-IP datasets 
to identify TE siRNAs present in AGOl protein complexes over 
100 RPM. We used the 100 RPM as an arbitrary cut off level, 
as all known inflorescence-expressed and experimentally verified 
functional microRNAs and tasiRNAs accumulate over this level 
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Figure 1A-D. For figure legend, see page 1384. 



1382 



RNA Biology 



Volume 10 Issue 8 




24 nt only 



21 nt only 



«*> 




AthilaG 



miR166a 



miR161 



Total sRNAs 
/ 6'^^ 



21 nt 
17nt 



21 nt 
17nt 



AthilaG 



miR166a 



microRNAs with statistically 
significant changes in accumulation 



MIR 


RPM In 

Col 
AG01-1P 


RPM In 
ddm1 F6 
AG01-IP 


fold 
decrease In 
dldm1 F6 


1 RR 
IDO 








159 


52 009 


33 829 


1.5 


158 


36 493 


23 081 


1 .5 


162 


9519 


6086 


1 .5 


396 


7528 


4014 


1.9 


319 


4334 


2872 


1.5 


403 


3854 


2576 


1.5 


163 


1434 


910 


1.5 


156 


1302 


862 


1.5 


480 


1062 


404 


2.6 


167 


734 


496 


1.5 


total 


544,455 


238,555 


2.3 


change 


305,900 





800,000. 



= 600,000. 
E 



-o 400,000' 



200,000' 



microRNAs 

p<0.01 



-I- 



AG01-IPsRNAs 



Figure 1E-G. Forfigure legend, see page 1384. 



www.landesbioscience.com 



RNA Biology 



1383 



Figure 1A-G (See previous pages). Analysis of AG01 immunoprecipitated small RNA deep sequencing. (A) The accumulation of 21, 22, and 24 

nucleotide (nt) small RNAs. Both total small RNAs and AGOI-immunoprecipitated (AG01-IP) small RNAs were sequenced from wild-type (wt) Col plants 
with epigenetically silenced transposable elements (TEs) and ddml F6 mutants that have transcriptionally activated TEs. Library size is normalized by 
calculating reads per million (RPM) of 1 8-28 nt perfectly genome-matched small RNAs. (B) Categorization of small RNAs by genome annotation fea- 
ture. In the AG01-IP libraries, ddml F6 has a decrease in microRNAs and an increase in TE-derived siRNAs compared to wt Col. (C) Analysis of reproduc- 
ibility of AG01-IP small RNA deep sequencing biological replicates by microRNA Revalues. Pair-wise comparisons were done both between biological 
replicates (green squares) of the same genotype, and between genotypes (yellow squares). (D) Analysis of sequence diversity in AG01-IP small RNA 
libraries. The diversity of individual sequences that accumulate over 10 RPM (left) or 100 RPM (right) in the ddml F6 AGOI-IPs increases 3-fold. (E) Anal- 
ysis of TE family contribution to the small RNA libraries. The three TE families that accumulate the most 21-22 nt siRNAs in ddml mutants are shown: 
Athila, AtGPl, and AtENSPMS. AtReplOD is shown as a control for a TE family that produces only 24 nt siRNAs. SiRNA accumulation is shown for both the 
total and AG01-IP small RNA libraries, and split into 24 and 21 nt size fractions. (F) A decrease in AG01 microRNA accumulation occurs in ddml F6 small 
RNA libraries compared to wt Col. Not all microRNAs experience a decrease in accumulation; those that statistically significantly decrease (P < 0.05) are 
listed. (G) Small RNA northern analysis of total small RNA (left) and AG01-IP small RNA (right). The increase in /\f/)/7a-derived small RNAs is evident in 
both the total small RNAs and AG01-IP; whereas the decrease in microRNA166 (miR166) is also evident in both contexts. 



in our wt Col AGOl-IP dataset (the average microRNA accumu- 
lates to 12 521 RPM, and tasiRNA TAS1C-D4 to 2604 RPM). 
SiRNAs that accumulate below this level likely would not alter 
the expression of their target mRNAs at a detectable level. In 
AGOI-IPs from wt Col, only five 21-22 nt TE small RNAs accu- 
mulate to this level, and are from the Vandal4 (a combined 314 
RPM), Athila4 (242 RPM), AtGP3 (228 RPM), RathE2_cons 
(166 RPM), and HelitronYlB (147 RPM) TE families (Table SI). 
These TE small RNAs have the potential to regulate a few target 
genie mRNAs (see below); however, in the ddml F6 mutant, 437 
TE siRNAs accumulate to at least 100 RPM in AGOl protein 
complexes (Fig. 2 A; Table SI). Therefore, the gene regulatory 
potential increases due to TE siRNA accumulation when TEs 
are transcriptionally active. Of the 437 21-22 nt TE siRNAs, 
80.5% are 21 nt and 19.5% are 22 nt (Fig. 2B). In addition, 
98.6% of the 437 siRNAs are derived from the Athila family 
of TEs, mostly from the Athila2 and AthilaGA families, as the 
Athila siRNAs account for 184,005 RPM (18.4%) of all of the 
ddml F6 AGOl-IP siRNAs (Fig. 2C). These siRNAs are gen- 
erated from the 3' non-protein coding region of Athila, as well 
as the ancient and degraded envelope protein-coding domain 
(Fig. 2D). Interestingly, portions of this 3' non-protein cod- 
ing region of Athila elements display high nucleotide identity 
between Athila subfamily member elements (Fig. S4), suggest- 
ing this region is under positive selective pressure (see Discussion 
section). The average accumulation of an Athila 1\-T1 nt siRNA 
from Table SI is 427 RPM, but they can range over 10 000 RPM 
for an individual 21 nt siRNA. Compared to wt Col AGOI-IPs, 
the average fold-increase of the ddml F6 437 TE siRNAs is 989, 
with some 21 nt siRNAs having as high as a 5000-fold increase, 
demonstrating that transcriptional activation of TEs results in 
particular TE siRNAs incorporating into AGOl to high levels. 

To determine which genes are potential targets of the five 
TE small RNAs that accumulate to over 100 RPM in wt Col, 
and the 437 TE siRNAs from ddml F6 AGOI-IPs, we used the 
psRNA microRNA target prediction software (see Materials and 
Methods).^'' We identified 65 genie mRNAs as predicted targets 
of the five TE small RNAs that accumulate in wt Col AGOI-IPs, 
and 3102 predicted genie mRNA targets of the 437 TE siRNAs 
in ddml F6 AGOI-IPs (Fig. 2E; Table S2). We did not continue 
our analysis of the 65 genes potentially regulated by the five TE 
small RNAs that accumulate in wt Col, as we do not currently 
understand the biogenesis of these small RNAs (whether they are 



TE-encoded microRNAs), and therefore experimental validation 
would be difficult. We focused instead on the 3102 mRNA tar- 
gets of siRNAs produced by transcriptionally active Athila TEs, 
as the regulation of these target mRNAs is under TE epigenetic 
control. Therefore, Athila is a suppressed repository of potential 
gene regulation by siRNAs, with a 47.7-fold increase in poten- 
tial TE siRNA targets when epigenetically activated in ddml F6 
mutants compared to wt Col. The predicted genie mRNA tar- 
gets, the TE siRNA that is predicted to target the mRNA, and 
the E-value score of the siRNA/mRNA target match are shown 
in Table S2. Although we identified 3102 predicted genie mRNA 
targets of TE siRNAs in ddml F6 mutants, 826 of these targets 
are predicted to be inhibited on the translational level and, there- 
fore, their mRNA accumulation levels are not expected to change 
in ddml F6 mutants (Table S2). We focused on the other 2276 
genes, which are predicted to be cleaved by the TE siRNA, gener- 
ating sliced transcripts without a 5' cap that are subject to mRNA 
degradation and can be assayed by mRNA expression analysis. 

Global gene expression analysis identifies putative genie 
mRNAs targeted by TE siRNAs. We next used global expres- 
sion analysis to determine which genes potentially targeted by 
TE siRNAs (from Fig. 2E; Table S2) show decreased mRNA 
expression in ddml F6 mutants. We performed three bioreps 
of an Affymetrix ATHl gene expression microarray of mRNA 
from both wt Col and ddml F6 inflorescences, the same geno- 
types and tissues used in the small RNA-seq experiments from 
Figure 1. We performed pair-wise comparisons of probe set 
expression values between normalized gene expression values for 
each biorep and found high degrees of similarity between bio- 
reps of the same genotype (Fig. 3A). As a control for probe sets 
with increased expression in ddml F6 plants, we analyzed the 
1155 TE probe sets on the array (red points in Fig. 36)^' and 
found 159 (13.8%) have a statistically significant (P < 0.05) > 
2-fold increase in expression. The TEs present on the array are 
not representative of the global TE distribution, as repetitive 
genome content was selected against when designing the array 
probes. We next identified 403 genes that decrease (> 2.0-fold) 
to a statistically significant level (P < 0.05) in ddml F6 mutants 
compared to wt Col (blue points in Fig. 3B; Table S3). Although 
these genes have decreased expression in ddml F6 mutants, 
the mechanism of their regulation may differ. Gene expres- 
sion is influenced by the epigenetic regulation of neighboring 
TEs in both plant and animals (reviewed in refs. 15 and 36). 
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Figure 2. Analysis of theTE siRNA content in AGOI-IPs. (A) The total number of TE siRNAs that accumulate over 100 RPM in AGOI-IPs from wt Col and 
ddml F6. (B) The relative siRNA size distribution of the siRNAs from (A). (C) The TE family characterization of the TE siRNAs from (A). Nearly all of the 
TE siRNAs that accumulate in AG01 over 1 00 RPM in ddml F6 mutants are from the Athila family of LTR retrotransposons. (D) Athila 21-22 nt siRNA 
production comes primarily from the ancient envelope (env) protein coding region and the 3' non-protein coding region of the/\f/)/7a2 and Athila6A TE 
families. Twenty-one nt siRNAs are shown in red, while 22 nt siRNAs are depicted in green. (E) The number of predicted target genie mRNAs using the 
TE 21-22 nt siRNAs sequenced in AGOI-IPs from (A). See manuscript for details on how the prediction of genie target mRNAs was performed. 



Transcripts originating in TEs may generate alternative genie 
transcriptional start sites, express aberrant transcripts that result 
in the silencing of the gene, or the TE may act as a cw-regulatory 
element altering the regulation of the gene's promoter (reviewed 
in ref 37). To remove the genes that have decreased expression 
in ddml F6 mutants, but may be regulated by a neighboring TE 
(and, thus, not regulated by a TE siRNA in trans), we deter- 
mined if each gene identified from our expression analysis con- 
tained a TE or TE fragment within its gene space. We defined 
the gene space as the exons, introns, untranslated regions, and 
1 kb upstream and downstream of the transcriptional start and 
stop sites. We used the 1 kb arbitrary criteria for surrounding 
gene space because we calculated the median intergenic space of 
the gene-rich chromosome arms as 690 bp. Although individual 
examples of TEs acting on neighboring genes at large distances 
through interaction with distal enhancers are known,"*'" these 
few examples come from large mammalian genomes with greater 
intergenic distances, and expanding our definition of gene space 
in Arabidopsis would result in reduction of the number of test- 
able genes. One hundred and five (26.1%) of the 403 genes that 
displayed decreased expression in ddml F6 plants contained a 
TE or TE fragment in their gene space (Table S3). The remain- 
ing 298 genes are not subject to regulation by a neighboring TE 
(at least within 1 kb) , and are therefore candidates for decreased 
regulation in ddml F6 due to the activity of trans-ncung TE 



siRNAs. These 298 genes represent 1.3% of the genes assayed on 
theATHl microarray. 

To identify global targets of TE siRNAs, we compared the 
2276 mRNA targets predicted to be cleaved from TE siRNAs 
accumulating in the ddml F6 AGOI-IPs (Fig. 2E) to the 298 
genes with decreased expression in ddml F6 (and no TE in the 
gene space). We found 27 genes that overlap in these datasets, 
which is not statistically different from the expected overlap 
between two sets of genes of these sizes. Table S4 lists the gene 
annotation, the targeting TE siRNA, and gene expression change 
in ddml F6 mutants for the 27 gene targets we informatically 
identified. All 27 of our identified genes are potentially targeted 
by 21 nt TE siRNAs, and the average accumulation of these siR- 
NAs in ddml F6 AGOI-IPs is 279 RPM, while the accumulation 
of some TE siRNAs ranges as high as > 800 RPM. We utilized 
this gene list to draw candidate mRNA targets for experimental 
validation. 

Validation of genie mRNA targets of TE siRNAs. We exper- 
imentally validated TE siRNA regulation of a subset of the target 
mRNAs from Table S4. We chose a subset of targets to test with 
a broad range of TE 21 nt siRNA accumulation in AGOl protein 
complexes (123-816 RPM) and a range of predicted strength of 
the complementarity between the TE siRNA and target mRNA 
(psRNA microRNA target prediction E value = 2-4.5). We tested 
these targets in two assays. First, we performed qRT-PCR on the 
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target genes in wt Col, ddml F6, and ddmllrdrS plants. This 
assay will confirm the expression decrease in ddml F6 plants com- 
pared to wt Col detected in the gene expression microarray from 
Figure 3. In addition, all 21-22 nt siRNAs generated from the 
3' region of the Athila TE are dependent on a functional RDR6 
protein for their production."^ Therefore, we predict that any 
mRNA targeted for cleavage by a TE siRNA in ddml F6 plants 
will no longer be targeted in a ddml/rdr6 Ao\ih\& mutant, as TE 
21-22 nt siRNA biogenesis is abolished. The mRNA levels of the 
target should decrease in ddml F6 single mutants, and return 
to near wt Col levels in ddml/rdr6 double mutants. Although 
the comparison between wt Col, ddml F6, and ddml/rdr6 g&ne 



Figure 4 (See previous page). Validation of a subset of genes tar- 
geted by TE siRNAs. (A) qRT-PCR of three predicted target genes from 
Table S4. Their relative expression has been normalized to 1 in wt Col. 
In each, their expression decreases in ddml F6 mutants, and increases 
back to near wt Col levels in ddm1/rdr6 mutants that have TE activation 
but do not produce TE 21-22 nt siRNAs. STTM transgenes in ddml F6 
plants block the interaction between a TE siRNA and its mRNA target 
in a sequence-specific manner, and expression that was decreased in 
ddml F6 plants is increased back to wt Col levels. (B) 5' cleavage RACE 
PCR analysis of the AT4g30850 (HHP2) gene. This gene is specifically 
cleaved by theTE siRNA in ddml F6 mutants, but not when theTE is 
epigenetically silenced (in wt Col) or when TEs are activated but do 
not produce 21-22 nt siRNAs (in ddm1/rdr6 double mutants). /\RF/6 is 
shown as a control for consistent levels of sliced mRNA. The experi- 
mentally determined cleavage site of the HHP2 mRNA (arrow) is shown 
along with the number of times this cleavage site was sequenced and 
the total number of sequences analyzed. (C) Phenotypic analysis of 
pollen abortion in wt Col, ddml F6, ddml/rdr6, and dc/mZ/STTM plants. 
The ddm//STTM plants are the same as in (A) and specifically block the 
interaction between the At2g1 6910 (/A/MS) mRNA and theTE siRNA. 
The box plot "+" is the mean, the box is the 25th and 75th percentile, 
while the whiskers represent the 1 0th and 90th percentile, n, number of 
individual anthers analyzed. (D) qRT-PCR of three target genes that did 
not make the predicted list of targets (Table S4). See (A) for description 
of genotypes and normalization. 



expression levels is a strong indicator of TE siRNA regulation, 
using this assay we could not deftnitively conclude that our indi- 
vidual predicted TE siRNA was specifically regulating the genie 
mRNA. Expression analysis in ddmllrdr6 double mutants can- 
not inform us about sequence specificity, as this double mutant 
approach is not specific to one siRNA/mRNA direct target, and 
gene regulation in a ddml /rdr6 double mutant could be due to an 
indirect effect and not our predicted TE 21 nt siRNA. As a gold 
standard for the regulation of a genie mRNA by a TE siRNA, we 
tested the regulation of some targets by directly interfering with 
the accumulation of specific TE siRNAs using sequence-specific 
short tandem target mimic (STTM) transgenes. STTM trans- 
genes express transcripts that are partially complementary to a 
microRNA or siRNA, sponging and degrading the small RNA, 
and specifically decreasing the interaction between the small 
RNA and the target mRNA in a sequence-dependent manner.'* 
We generated STTM transgenes specific to particular TE siRNA 
sequences and transformed them into ddml F5 mutant plants to 
produce ddml F6 plants with individual STTM transgenes. We 
have confirmed the targeting of the genie mRNAs At2gl6910/ 
AMS, At4g30850/HHP2, and Atlg57790 by the TE siRNAs we 
predicted using STTM transgenes (Fig. 4 A, Table 1). In each of 
these cases, the decreased expression detected on the ddml F6 
microarray was independently confirmed by qRT-PCR, and the 
average decrease in expression for these three genes was 1.91, less 
than the 2.53 average fold decrease detected for these three genes 
detected on the microarray (Fig. 4 A, Table 1). Importantly, the 
decrease in expression detected in ddml F6 plants is dependent 
specifically on the predicted TE siRNA. Therefore, the regula- 
tion of these genie mRNAs is not regulated by TE activity and 
expression alone, but specifically by TE siRNAs produced from a 
separate locus. In addition, for the genie mRNA At4g30850 that 
encodes the HHP2 transmembrane receptor protein, we were 
able to detect the accumulation of a distinct 5' cleavage site in 



Figure 3. Gene expression analysis in ddml F6 mutants. (A) R^ values 
of ATH1 gene expression microarrays between biological replicates of 
the same genotype. Pair-wise comparisons were performed between 
biological replicates of the same genotype. (B) MAS5.0 linear normal- 
ized intensity values for each entity represented on the ATH1 array. The 
TEs present on the array are colored in red, and many of these show 
the predicted increase in expression in ddml F6 mutants. Genes with a 
decrease in expression (> 2-fold decrease, P < 0.05) are shown in blue. 
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Figure 4. For figure legend, see page 1386. 
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ddml F6 single mutants (with TE expression and siRNA produc- 
tion), but not in ddmllrdr6 double mutants (with TE expression 
and no siRNA production) (Fig. 4B), demonstrating that this 
site is cleaved by the predicted TE siRNA. Sequencing of this 5' 
cleavage product identified that all cleavage occurs at one spe- 
cific location within the pairing between the siRNA and mRNA 
(Fig. 4B). The experimentally confirmed genie mRNAs targeted 
by epigenetically regulated TE siRNAs are shown in Table 1. 
The average TE siRNA accumulation level of these experimen- 
tally confirmed regulators of gene expression is 358 RPM, while 
the average qRT-PCR expression decrease of the target genes in 
ddml F6 is 2.77-fold. 

Of the 27 predicted genie mRNAs targeted by TE siRNAs in 
AGOl-IPs, one of these has a well-characterized morphological 
phenotype. The gene Aborted Microspores {AMS) (At2Gl6910) 
is a basic helix-loop-helix transcription factor expressed in the 
immature floral inflorescence and in the developing anther that 
plays a critical role in pollen development. In ams loss-of-func- 
tion mutants, pollen fails to develop and the plants are male ster- 
ile.^' We investigated the pollen abortion phenotype in wt Col 
and ddml F6 plants and showed that ddml F6 has statistically 
increased levels of pollen abortion (Fig. 4C) . In ddml/rdr6d-0\AAc 
mutants, in which TEs are still transcriptionally activated but do 
not produce 21-22 nt Athila siRNAs, pollen abortion decreases 
to near wt Col levels. Additionally, in ddml F6 plants that gener- 
ate TE-derived 21-22 nt siRNAs, and the AMS-targeting siRNA 
sequence is sequestered and/or degraded by a sequence-specific 
STTM transgene transcript, the interaction between the siRNA 
and AMS mRNA is blocked. This results in a reduction of the 
pollen abortion phenotype to near background levels (Fig. 4C). 
These data demonstrate that the regulation of the 50% reduc- 
tion in expression of the AMS gene (Fig. 4A) manifests as a weak 
but apparent phenocopy of the ams loss-of-function mutation 
(Fig. 4C). 

In addition to the predicted genie mRNA targets of TE siR- 
NAs that we filtered using strict criteria (Tables Sl-4) , there were 
additional mRNA targets that we felt were likely regulated by 
TE siRNAs, even though they were filtered out of our analysis. 
We have confirmed that three genie targets, which did not make 
our final list of 27 targets, are also directly targeted by TE siR- 
NAs. These genie mRNAs include AtAgQ796QIAtCLSC12 and 
At5g27600 (both filtered off the final list in Table 54 due to 
having a TE in its gene space) and At4gl76l0 (less than a 2-fold 
change by microarray analysis) (Fig. 4D, Table 1). Therefore, the 
current number of experimentally validated Arabidopsis genie 
mRNAs regulated by epigenetic TE activity and TE 21-22 nt 
siRNAs is seven, six of which are identified in Figure 4 plus 
UPBlb from McCue et al."^ UBPlb did not make our Table 54 
list of predicted targets because the repression of UBPlb in ddml 
inflorescence is at the translational level,"" and the score of the 
strength of pairing between the Athila-A-cnwcA siRNA854 and 
UBPlb relies on non-canonical base pairing'"' that scores below 
the threshold we used. The lack of identification of UBPlb and 
the identification of three targets not on our final list (Fig. 4D) 
demonstrate that our identified 27 targets in Table 54 is an 
underestimate of the number of genie mRNAs regulated by TE 



siRNAs. In addition, for all seven confirmed gene targets of TE 
21 nt siRNAs, we failed to detect any similarity between the 
genie mRNA and TE that extended beyond the short site of TE 
siRNA complementarity (data not shown), demonstrating that 
these target genes do not represent a class of missed TE annota- 
tion, but only share a small amount of identity to a TE, shown 
in Table 1. 

To determine if the genes regulated by TE 21-22 nt siRNAs 
are part of a common pathway or if particular cellular functions 
are specifically targeted, we performed a gene ontology classifica- 
tion on the genes downregulated in ddml F6 mutants, as well as 
our predicted and confirmed lists of target genes. We found pri- 
mary metabolic processes (p = 8.54E-36), particularly carbohy- 
drate and lipid metabolism, and catalytic activity (p = 1.16E-26) 
over-enriched in the list of total genes downregulated in ddml F6 
mutants (Table 53). The enrichment of these gene classifications 
persisted in our list of genes downregulated in ddml F6 without 
TEs in their gene space (Table 53), and the list of 27 predicted 
genes regulated by TE 21-22 nt siRNAs (Table 54). In fact, of 
our final list of seven confirmed targets (six from Table 1 plus 
UPBlb), five are involved in primary metabolic processes, sug- 
gesting this broad biological process is a specific target of regula- 
tion by TE siRNAs when TEs become epigenetically activated. 

Gene regulation via TE siRNAs can act as a suppressor of 
TE silencing. To determine why TE 21 nt siRNAs regulate some 
genes, we focused on the critical unresolved question of whether 
the regulation of genes by TE siRNAs is simply due to random 
partial complementarity between siRNAs and genes, or if this 
regulation imparts some advantage to the host cell or TE that 
may be evolutionarily selected for over time.^^ To address this 
question, we used the Athila6-A&[iYeA siRNA854 and its target 
UPBlb genie mRNA. UBPlb is a protein predicted to play a role 
in the formation of stress granules, sites of translational repres- 
sion of mRNAs.*' The animal UBPlb homolog, TIA-1, acts to 
repress viral translation, and is targeted by viral microRNAs that 
act as suppressors of host viral silencing.''^''" In plants, UBPlb 
protein is located in the nuclei of unstressed cells, and translo- 
cates to punctate cytoplasmic bodies in stress conditions (such 
as dark grown or osmotic stress conditions) or during the loss of 
heterochromatic control and TE activation in ddml mutants."" 
Therefore, we hypothesized that UBPlb, and potentially stress 
granules, may act to translationally inhibit TE activity, and that 
Athila6 has retained the siRNA854 sequence as a mechanism to 
suppress the host UBPlb-based translational silencing of TEs. 

To investigate the role UBPlb plays in sAcncmgAthila transla- 
tion, we first confirmed that during stress conditions, UBPlb is 
indeed localized to stress granules and not P-bodies, which are 
sites of mRNA decay (reviewed in ref 44). We created plants 
with the UBPlb protein fused to GFP and either the DCP2 
protein (a P-body marker) or the G3BP protein (a stress granule 
marker) fused to REP. We transformed these markers into wt 
Col and observed their localization in stressed root meristem cells 
grown without light. We determined that UBPlb-GFP fluores- 
cence overlaps with the G3BP protein, which specifically marks 
the sites of stress granules (Fig. 5A). In contrast, UBPlb fluo- 
rescence did not overlap with the DCP2-RFP marked P-bodies 
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Table 1. Experimentally validated target genes of epigenetically-regulated TE siRNAs 
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(Fig. 5A). This data demonstrates this for the first time in whole 
Arabidopsis organs that the UBPlb protein is indeed a compo- 
nent of stress granules. We next aimed to determine if TE protein 
levels increase in ubplb mutants, which would indicate that the 
wt UBPlb protein acts to repress TE protein levels. We raised a 
polyclonal antibody to the GAG capsid protein of the Athila6 
retrotransposon. We used this antibody for Westerns using pro- 
tein extracted from inflorescence tissue in a panel of single, dou- 
ble and triple mutants with various combinations of the ddml, 
rdr6, and upblb mutations. A representative Western of three 
independent experimental and biological replicates is shown in 
Figure 5B, with the levels and variation of GAG accumulation 
normalized to Actin across these biological replicates shown in 
Figure 5C. In wt Col or rdr6 single mutant plants, Athila6 is 
transcriptionally silenced'^ and the level of GAG protein is virtu- 
ally undetectable. In ddml F2 or F6 single mutants, Athila6 is 
transcriptionally activated, and the levels of GAG protein increase 
4-fold. We next investigated ubplb mutants, which are not in the 
reference Col background as are the ddml and ri^rd^ mutations, 
but rather in a WS background."" We tested this upblb mutant in 
WS, as well after introgressing the ubplb mutation into Col over 
six generations. Some individual bioreps of any genotype with a 
WS background show low levels of GAG protein accumulation 
(Fig. 5B), but over the course of the three biological replicates the 
level of GAG protein does not show significant changes in wt WS 
or upblb single mutants (in Col or WS backgrounds) compared 
to wt Col (Fig. 5C). In rdr6/ubplb double mutants (which have a 
mixed Col/WS background), Athila6TEs are still transcription- 
ally silenced, and GAG protein does not accumulate. In ddml/ 
ubplb mutants, Athila6 elements are transcriptionally active (due 
to the ddml mutation), but GAG protein levels decrease com- 
pared to ddml single mutants, and we are unable to conclude 
if this decrease is due to the upblb mutation or the mixed Col/ 
WS background. We generated two types of ddml/rdr6 double 
mutants in which TEs are transcriptionally active but these tran- 
scripts are not degraded by RNAi via RDR6. One is in a pure 
Col background, and one is in a Col/WS mixed background. We 
found that the background does play a role in the accumulation of 
GAG protein, as the mixed background plant accumulates higher 



GAG protein levels compared to the pure Col background. Most 
importantly, in each of the three biological replicates, the ddml/ 
rdr6/ upblb triple mutant displayed the highest level of GAG 
protein accumulation (Fig. 5B and C). We observed a 2-fold 
increase in GAG protein accumulation compared to the mixed- 
background ddml/rdr6 do\ih\c mutant {P < 0.05). These data 
demonstrate that the UBPlb stress granule protein plays a role 
in repressing AthilaG GAG protein production when the TE is 
transcriptionally activated (in a ddml mutant) and RNAi is abol- 
ished (in the rdr6 mmam). Therefore, UBPlb serves as a third 
line of defense in the cell against the activity of TEs, and is only 
required once transcriptional and post-transcriptional silencing 
have failed. We thus conclude that the inhibition of UPBlb pro- 
tein production by the Athila-Asnveil 21-22 nt siRNA854 acts 
as a suppressor of a host TE silencing mechanism, demonstrating 
a derived function for an individual epigenetically regulated TE 
siRNA. 

Discussion 

TEs are appreciated as regulators of gene expression in cases 
where a TE or TE fragment neighbors the gene being regulated 
(reviewed in ref 15). This type of regulation can occur due to a 
TE being in close proximity to a gene's protein coding region, or 
at a larger distance if the TE influences a CM-regulatory element 
such as an enhancer element. Recently, two individual examples 
demonstrated that a TE can regulate a gene through a small RNA 
in trans from an unlinked position, without the gene containing 
any annotated TE fragment or homology to the TE outside of 
partial complementarity over the length of the small RNA.'*'^^ 
Here, we have demonstrated that gene regulation by TE siR- 
NAs is far more pervasive than the previous two examples. We 
have gone from two experimentally validated targets of TE small 
RNAs {nanos from Drosophila and UPBlb from Arabidopsis) to 
seven in Arabidopsis (Table 1), with an additional 24 likely tar- 
gets informatically predicted by our analysis (Table 54). We are 
now able to estimate the total number of genes targeted by TE 
siRNAs, and we estimate that this number lies between 20-300 
(see below). In addition, we have demonstrated that one genie 
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Figure 5 (See previous page). UBPIb Is a stress granule protein that acts to repress /\f/)/7o protein production. (A) Sub-cellular GFP localization of the 
UPB1 b protein in dark-grown stressed root meristem cells. Plants also contained a transgene with RFP fused to either the DCP2 P-body protein, or the 
G3BP stress granule marker. (B) Western analysis of Athila6 GAG protein levels in double and triple mutant plants. Actin is shown as a protein loading 
control. The highest level of GAG protein is detected in clclml/rclr6/ubplbtnp\e mutant plants. (C) Statistical analysis of GAG protein accumulation in 
three independent westerns as in (B). For each of the three analyses, GAG protein is highest in the ddml/rdr6/ubp1b triple mutant plants. GAG protein 
levels in each western and for each sample were normalized to the level of Actin protein accumulation. 
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target of TE siRNAs plays a direct role in repressing TE activ- 
ity, and this data provides evidence that these TE siRNA-target 
mRNA interactions provide the TE with a replicative advantage 
that may be evolutionarily selected for. In this case, the siRNA 
(siRNA854) comes from a non-protein coding region of the 
Athila6 retrotransposon, and acts as a suppressor of a host mecha- 
nism to repress TE activity. 

Number of target genes. We have experimentally validated 
seven genes (0.03% of the tested Arabidopsis genes) that are reg- 
ulated by TE siRNAs. Although this is a low number, it is signifi- 
cant, as we have demonstrated a methodology that can be used to 
identify additional genes regulated by TE siRNAs, and this num- 
ber serves as a lower bound for the number of genes regulated 
by TE siRNAs. In addition to these seven confirmed targets, an 
additional 24 targets are excellent bioinformatic candidates based 
on their complementarity to a TE siRNA, their decreased expres- 
sion in ddml F6 mutants, and their lack of a neighboring TE 
(Table S4). Overall, we conservatively estimate that at least 20 
genes are targeted by TE siRNAs, and this number may easily be 
more than 10-fold higher. Our upper estimate of the number of 
genes regulated by TE 21-22 nt siRNAs in Arabidopsis is 300, 
as this is the number of genes with decreased expression in ddml 
F6 plants without a neighboring TE. This is likely an overestima- 
tion, as not all of these genes will be regulated by this one mech- 
anism. For example, some of these genes may be regulated by 
neighboring TEs that lie over 1 kb from the gene's start and stop 
sites that we used as a cutoff in our analysis. This overestimation 
may account for why the overlap is not greater between the genes 
with decreased expression in ddml F6 mutants and the predicted 
targets of AGOl-incorporated ddml TE siRNAs. 

The upper bound estimate of Arabidopsis genes regulated 
by TE 21-22 nt siRNAs is difficult to estimate for the follow- 
ing reasons. First, we have used stringent arbitrary cutoffs, and, 
therefore, many TE siRNAs and genie mRNAs were not consid- 
ered. For example, we both limited our scope to the TE-derived 
siRNAs that accumulated to at least 100 RPM and to only the 
403 genes that we have experimentally identified as having at 
least a 2-fold decrease (and P < 0.05) in expression in ddml F6 
plants. Although limiting our scope to those genes with at least 
a 2-fold decrease makes verification of these targets easier, it 
likely does not take into consideration any TE-regulated genes 
that experience a more subtle change in expression, underesti- 
mating the total number of genes that experience regulation by 
TE siRNAs. Second, like UBPlb, many of the genes targeted 
by TE siRNAs may not show decreased mRNA accumulation, 
as both plant and animal siRNAs can inhibit protein produc- 
tion on the translational level.' Using only mRNA expression 
data, the number of genes regulated by translational repression 
of TE siRNAs is difficult to gauge, and since translational inhi- 
bition cannot be assayed on a high-throughput genome-wide 
level, translational repression is overlooked in our analysis. For 
example, we predicted 826 genes to be targeted on the transla- 
tional level by TE 21-22 nt siRNAs that are incorporated into 
AGOl (Table S2), but these predicted targets were not inves- 
tigated further. Third, we investigated only one tissue at one 
developmental stage, and additional regulation will likely take 



place as the genie transcriptome changes from tissue to tissue. 
Lastly, we have chosen analysis with STTM transgenes as the 
gold standard to validate TE siRNA/genic mRNA targeting 
(Fig. 4). This method is scientifically accurate, but laborious 
and, therefore, has limited the scope and number of target genes 
we have experimentally validated. 

Decrease in target gene expression. The level of decrease 
of the genes regulated by TE siRNAs is relatively minor com- 
pared to either knockout mutations or targeting by a microRNA. 
Knockout mutations occur on the genetic level and affect all cop- 
ies of the potential transcript, and gene regulation by microRNAs 
stems from thousands of RPM of the microRNA accumulating 
in AGOl-IPs (Fig. IF). We observe that a range of 126-816 
RPM TE siRNA accumulation is sufficient for regulation of their 
respective target genes (Table 1). However, this is a 15-100-fold 
decrease in levels of the TE siRNAs accumulating in AGOl com- 
pared to the accumulation of the average microRNA. Therefore, 
the regulation by TE siRNAs is likely more of a 2 -5 -fold modu- 
lation of mRNA levels rather than a destruction of all mRNA 
and absolute lack of protein production. However, in the case of 
the target gene AMS, enough gene expression change is gener- 
ated by the TE siRNAs to see a visible though subtle phenotype 
(Fig. 4C). 

Types of targeted genes. Two observations stem from the 
identities of the genes regulated by TE siRNAs. First, we dem- 
onstrated that one target mRNA, UPBlb, encodes a protein that 
specifically acts as a host cell defense mechanism to inhibit TE 
activity (Fig. 5). This suggests that one set of TE siRNA target 
genes likely includes the host proteins responsible for TE surveil- 
lance and repression, and that the ability to regulate a gene via a 
TE siRNA has evolved from the evolutionary arms race between 
the TE and host genome. TE siRNAs may represent suppressors 
of host silencing and repression systems, allowing the TE to gain 
activity when some of its TE transcripts are degraded by the host 
small RNA defense system. The second observation is that the 
gene ontology category of "primary metabolic processes" is statis- 
tically enriched for the genes regulated by TE siRNAs. However, 
the purpose of these common-function targets is not clear. 

TEs that participate in siRNA-mediated gene regulation. 
In ddml F6 mutants, nearly all TEs are transcriptionally reac- 
tivated. Of these, we previously identified 15 TE sub-families 
that are targeted by endogenous RNAi and produce 21-22 nt 
siRNAs.^' Here, we have identified only five TE families whose 
TE siRNAs are incorporated into AGOl {Athila, AtGPl, 
AtENSPM, Vandal, and Helitron). Of these, we find that only 
two families {Athila and AtENSPM) are able to regulate gene 
expression. Overall, we find no correlation between those TEs 
that produce 21-22 nt siRNAs and genome position, copy num- 
ber, or expression level of reactivated TEs in ddml?^ After pro- 
duction of 21-22 nt siRNAs, why some are incorporated into 
AGOl and others are not is also currently unknown. We cannot 
determine if the most abundantly produced TE siRNAs are those 
that get incorporated into AGOl, because AGOl protects these 
siRNAs from degradation, resulting in their high accumulation 
levels. However, we believe that the siRNAs that accumulate to 
the highest levels in AGOl are most likely to regulate genes. We 
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also must consider that the identification of only two TE families 
that can regulate genes may be a bias generated from the 100 
RPM cutoff of TE siRNAs in AGOl we used. Although other 
TE families do not produce many TE siRNAs that are incorpo- 
rated into AGOl, we cannot definitively say that they have no 
regulatory effect on the genie transcriptome. 

Of particular interest is the ability of the 3' region of Athila 
LTR retrotransposons to produce large amounts of TE 21-22 nt 
siRNAs (Fig. 2) and regulate multiple genes (Table 1; Table S4). 
The Athila 3' region of siRNA production overlaps with the 
ancient envelope protein coding domain and the non-protein cod- 
ing 1.6 kb region adjacent to the 3' LTR (Fig. 2D). Therefore, 
this well-conserved region of Athila acts as a suppressed reposi- 
tory for gene regulatory siRNAs and has the potential to regu- 
late hundreds of genes (Fig. 2E; Fig. S4, Table S2). Although 
the RDR6-dependent siRNA processing of Athila is a cellular 
response to inhibit the element's proliferation, we speculate that 
Athila TEs may have retained this 3' non-protein coding region 
to regulate a diverse set of genes as a secondary function of these 
siRNAs, similar to how tasiRNA-producing loci regulate suites 
of target genes. In addition, TE 21-22 nt siRNA generating 
loci, such as the 3' end of Athila, may themselves be conserved 
epigenetically regulated tasiRNA loci, and possibly represent 
the evolutionary origin of the other developmentally regulated 
tasiRNA-generating loci. 

Our inability to detect similarity between the gene target and 
TE outside of the siRNA binding site suggests that specific genes 
are not targeted because of an unannotated TE fragment inserted 
within their gene space. Instead, we favor a stochastic model in 
which TE siRNAs are produced, and regulate somewhat random 
targets. If this targeting is beneficial to either the TE or host, this 
targeting will be selected and maintained."' Another possibility 
is that short gene sequences are incorporated into TEs, a process 
referred to as ir/zw.f-duplication.''' Since different organisms, and 
perhaps different strains of the same organism, have vastly dif- 
ferent complements of TEs, the possibility exists that the genes 
regulated by TE siRNAs differ widely and contribute to the gen- 
eration of species- or strain-specific variability. 

TE siRNA accumulation in AGOl. When TE 21-22 nt siR- 
NAs are incorporated into the AGOl protein in ddml mutants, 
the level of AGOl protein does not increase; rather, the accumu- 
lation of specific microRNAs is reduced to accommodate the new 
TE siRNAs (Fig. 1; Fig. S2). We do not believe that the AGOl 
protein is selectively binding TE siRNAs, but instead, the high 
level of TE 21-22 nt siRNAs produced when TEs are transcrip- 
tionally activated in ddml mutants results in a flooding of AGOl 
with TE siRNAs due to their sheer abundance at the expense of 
microRNAs. Why specific microRNAs experience reduced levels 
in AGOl when TEs are active is currently unknown. We observe 
that the microRNAs that experience changes in AGOl incor- 
poration do so without consequence to the regulation of their 
target mRNAs (Fig. S3), possibly due to an excess of microRNA 
accumulation in AGOl when TEs are epigenetically silenced. 
We speculate that there is a feedback mechanism that buffers the 
levels of particular microRNAs in AGOl against drastic changes 



due to the critical mass of microRNA required for proper gene 
regulation, particularly for low-abundance microRNAs. 

Many different TE siRNAs are produced in ddml plants; how- 
ever, only some of them are incorporated to high levels in AGOl 
protein complexes (Fig. 1; Table SI). Individual TE siRNAs are 
not as selectively enriched in AGOl protein complexes (relative 
to the total TE small RNA pool) compared to the enrichment of 
individual microRNAs. The primary function of these TE siR- 
NAs is likely the post-transcriptional degradation of TE mRNAs. 
However, we have found that particular TE siRNAs can regulate 
genie mRNAs due to their partial complementarity and incorpo- 
ration into AGOl. Why particular TE siRNAs are incorporated 
into AGOl is likely due to their specific size generated by DCL4 
and DCL2 (21 and 22 nt, respectively), their 5' nucleotide,'^ and 
their subcellular proximity to AGOl protein. 

Environmental and species-specific modulation of gene 
expression via TE siRNAs. We chose the ddml mutant for our 
analysis because it has a very high level of global TE transcrip- 
tional activity,^** and it produces very high levels of TE 21-22 nt 
siRNAs."" However, gene regulation in this mutant context was 
a proof-of-principle to determine how many and which genes 
are regulated in this manner. Now that we have rough answers 
to the questions of the number of genes regulated, the identity 
of those genes, and if TEs can purposely encode a siRNA to 
regulate a gene, our attention turns to biologically relevant cases 
of TE activation to determine the role of TE regulation of non- 
neighboring genes. First, in contrast to the wt Col Arabidopsis 
genome, other plants and animals have genomes with naturally 
active TEs and accumulate TE siRNAs as a product of the post- 
transcriptional host defense against TE activity. It will be 
interesting to determine if and how genomes with naturally 
active TEs deal with constant gene regulation produced from 
ever-present gene-regulating TE small RNAs. Second, environ- 
mental stresses such as large swings in temperature are known 
to epigenetically activate TEs,^°'^' and we hypothesize that this 
transient TE activation will lead to transient gene regulation, 
potentially contributing an epigenetic effect on an organism's 
response to stress. 

Materials and Methods 

Plant material. The mutant alleles used in this study are described 
in Table S5. Plants were grown under standard long day condi- 
tions at 22°C. Inflorescence tissue was used in each experiment 
unless otherwise noted. 

AGOl immunopurification. The AGOl protein was immu- 
noprecipitated as in McCue et al."" For the AGOl western blot, 
protein was obtained by grinding inflorescence tissue with liquid 
nitrogen and homogenizing in 2 mL extraction buffer [100 mM 
Tris-HCl pH 7.5, 150 mM NaCl, 1 mM EDTA, and 5 mM DTT 
containing one tablet/10 mL protease inhibitor cocktail (Roche)] 
per gram of tissue. Protein was run on an SDS-PAGE gel, trans- 
ferred by Trans Blot SD Semi-Dry transfer system (BioRad), and 
immunoblotted with a-AGOl and a-ACTll (loading control) 
antibodies (Agrisera AB). 
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Small RNA deep sequencing and analysis. Libraries were 
produced using the TruSeq Small RNA Sample Preparation Kit 
(lllumina). Each library was barcoded and sequenced in one lane 
of an lllumina Genome Analyzer Ilx. The resulting sequences 
were de-multiplexed, adapter trimmed, and filtered on length 
and quality, and tRNA/rRNA and low complexity reads were 
removed. Small RNAs were matched to the Arabidopsis genome, 
and sequences that did not perfectly align were discarded. Library 
size was normalized by calculating reads per million of 18-28 nt 
genome-matched small RNAs. Small RNAs were also matched to 
the TAIRIO (www.arabidopsis.org) and Repbase (www.girinst. 
org) annotation of the TE portion of the Arabidopsis genome 
using bowtie. To best handle multi-mapping sequences generated 
from repetitive regions of the genome, the bowtie modifiers "-best 
-Ml -strata" were employed.'" If more than one genome perfect 
match for a TE siRNA exists, only one random match is assigned 
per small RNA read. The raw sequencing and genome-matched 
small RNAs analyzed are available from NCBl GEO repository 
number GSE45990. Small RNA tracks and display of the data 
in Figure 2 were performed using the SiLoMa small RNA locus 
mapper of the UEA small RNA toolkit.^" Predictions of small 
RNA target genes were made using the psRNA-Target analysis 
tool.'^ We used a psRNA-Target mRNA/small RNA pair scoring 
cutoff of E < 5 points, which allows for some G:U wobble pair- 
ings (0.5 points each), insertions/deletions (2.0 points each), and 
non-canonical Watson-Crick pairings (1 point each), with 0.5 
points added if the mismatch occurs in the small RNA 5' seed 
region." 

Small RNA northern analysis. Total RNA was isolated 
using TRlzol reagent (Life Technologies), and small RNAs were 
concentrated with polyethylene glycol. Twenty-eight micro- 
grams of small RNA-enriched RNA were loaded in each lane. 
Immunoprecipitated small RNA northern analysis was performed 
by directly isolating RNA from immunoprecipitations by TRlzol 
(Life Technologies) with no further small RNA size enrichment. 
Eight |JLg of RNA was loaded in each lane and the MicroRNA 
Marker (New England Biolabs) was used for size comparison. 
Gel electrophoresis, blotting, and cross-linking were performed 
as described in Pall et al.'^ Athila probes were generated by ran- 
domly degrading a P32 -labeled in vitro RNA transcript. PGR 
primers used to generate the probes are listed in Table S5. 

Microarray and analysis. Wild-type Gol and ddml F6 plants 
were grown side-by-side, and three non-overlapping pools of 
inflorescence tissue were collected. RNA isolation was per- 
formed using TRlzol reagent (Life Technologies) followed by 
RNeasy MinElute Gleanup Kit (Qiagen). Oligo-dT primed 
cDNA was labeled and hybridized to the Affymetrix ATHl- 
121501 gene expression microarray. Three biological replicates 
were performed for each genotype. Analysis of the microarray 
data was performed using the GeneSpring software suite (Agilent 
Technologies). Relative expression levels for the genes selected in 
Table S3 were calculated using gcRMA normalization. Figure 3 
utilized MAS5.0 normalized data for improved data display. 
Gene Ontology categorization and statistical over-enrichment 
tests were performed using the Panther Glassification System and 
Bonferroni correction for multiple testing.'^ The raw microarray 



files, normalized dataset, and further experimental details are 
available from NGBI GEO repository number GSE46050. 

Transgene construction. The 35 S -driven short tandem target 
mimic (STTM) transgenes were generated by cloning the STTM 
sequence specific to each TE siRNA of interest into the BsrGI/ 
Spel-digested binary plasmid pB2GW7. The STTM sequences 
were generated as a 96 nt long DNA oligo (Sigma), ampli- 
fied using primers with homology to the ends of the linearized 
plasmid, and recombined into the vector using In Fusion HD 
(Glontech). 

The 35S:L'B/'7^-GFP transgene was produced by Gateway 
(Life Technologies) cloning the coding sequence (no introns) of 
UBPlb (Atlgl7370) into the binary plasmid pK7FWG2, result- 
ing in a G-terminal protein fusion to eGFP. The 35S:Z)CP2-RFP 
and 35S:G35/'-RFP transgenes were produced by Gateway clon- 
ing the coding sequences (no introns) of DCP2 (At5gl3570) and 
G3BP (At5g60980) into the binary plasmid pH7RWG2, result- 
ing in a C-terminal protein fusion to REP. PGR primer sequences 
are listed in Table S5. 

Microscopy. Dark-grown etiolated seedlings were grown on 
1/2X MS media for 7 d and their roots were analyzed using a 
Gl confocal microscope and NIS-Elements software (Nikon 
Corporation). The pattern of fluorescence accumulation was 
verified in 10 or more cells from individual plants. 

Expression analysis by qRT-PCR. Three biological replicates 
were performed for each genotype. Each replicate consisted of a 
non-overlapping pool of individuals. qRT-PGR was performed 
and analyzed as in reference 31: cDNA was generated using 
an oligo-dT primer and Tetro Reverse Transcriptase (Bioline). 
qPGR was performed using SensiMix (Bioline) on a Mastercycler 
ep realplex thermocycler (Eppendorf). The Atlg08200 gene was 
used as a reference in all qRT-PGR reactions. qRT-PGR primers 
are listed in Table S5. Relative expression was calculated using the 
"delta-delta method" formula 2-(AGP sample - AGP control), 
where 2 represents perfect PGR efficiency. All values were nor- 
malized relative to wt Gol, where wt Gol was set to 1. Statistical 
significance was calculated with an unpaired t-test using Welch's 
correction. PGR primer sequences are listed in Table S5. 

Cleavage RACE-PCR. Cleavage RACE-PGR was performed 
with the GeneRacer kit (Life Technologies) . To select for cleaved 
products with a 5'-OH, the protocol was modified to begin with 
ligation of the RNA adapter oligo to 5 p-g of the starting RNA 
sample. Subsequent RNA extraction, cDNA synthesis, and PGR 
amplification followed. Nested PGR was performed using the 
PGR primer sequences listed in Table S5. 

Analysis of pollen sterility. Pollen sterility was analyzed by 
Alexander's staining as in Peterson et al.,'^ except that the tis- 
sue was not fixed. Live anthers were subject to Alexander's stain 
for 20 min at room temperature before being analyzed by light 
microscopy. 

GAG protein accumulation by western. Protein was extracted 
from 0.2 g of fresh inflorescence tissue using 1 ml of extraction 
buffer (0.2 M Tris-HCl pH 70, 20 mM EDTA, 1.5 M Urea, 
and 2% Triton-XlOO). Quantification of soluble protein was 
performed by DC protein assay (BioRad). One hundred and 
twenty-five |xg/ul of protein was mixed with 1:1 suspension 
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buffer (100 mM sodium phosphate pH 7.5, 100 mM NaCl) and 
2X SDS dye-loading solution (200 mM Tris-HCl pH 6.8, 4% 
SDS, 20% Glycerol, 400 mM DTT, 0.2% bromophenol blue) 
and boiled for 10 min. Protein was run on a 12% SDS-PAGE 
gel and transferred to PVDF membrane (Millipore). We raised 
a mouse polyclonal antibody against the Athila GAG epitope 
GDKAHQWEKS (Abmart), and used this at a 1:500 dilu- 
tion. The ACT-11 antibody was used as a loading control and is 
described above. 
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